Attribute-Oriented Knowledge Discovery in Rough Relational Databases

Authors

  • Theresa Beaubouef
  • Frederick E. Petry
Abstract

The rough relational database model was developed for the management of uncertainty in relational databases. This paper describes a particular type of knowledge discovery: attribute-oriented induction of rules from generalized data.

Rough Relational Databases

In rough sets (Pawlak 1984), an approximation space is defined on some universe $U$ by an equivalence relation $R$ that partitions the universe into equivalence classes called elementary sets, based on some definition of 'equivalence' as it relates to the application domain. Any finite union of these elementary sets is called a definable set. A rough set $X \subseteq U$, however, can be described in terms of the definable sets by its lower ($\underline{R}X$) and upper ($\overline{R}X$) approximation regions:

$\underline{R}X = \{x \in U \mid [x]_R \subseteq X\}$ and $\overline{R}X = \{x \in U \mid [x]_R \cap X \neq \emptyset\}$.

We may refer to $\underline{R}X$ as the positive region, $U - \overline{R}X$ as the negative region, and $\overline{R}X - \underline{R}X$ as the boundary or borderline region of the rough set $X$. The lower and upper approximation regions, then, allow the distinction between certain and possible inclusion in a rough set.

The rough relational database model captures all the essential features of the theory of rough sets, including the notion of indiscernibility of elements through the use of equivalence classes and the idea of denoting an indefinable set by its lower and upper approximation regions. The attribute domains in this model are partitioned by equivalence relations designated by the database designer or user. Within each domain, a group of values that are considered indiscernible forms an equivalence class. The query mechanism uses class equivalence rather than value equality in retrievals. A user may not know the particular attribute value, but might be able to think of a value that is equivalent to the value required. For example, if the query requests "COLOR = 'BROWN'", the result will contain all colors that are defined as equivalent to BROWN, such as TAN, SORREL, or CHESTNUT.
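
The approximation regions and the class-based retrieval described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the integer universe, and the horse rows are assumptions made for illustration, and only the BROWN, TAN, SORREL, CHESTNUT class is taken from the text.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List, Set

Classes = List[FrozenSet[Hashable]]


def partition(universe: Iterable[Hashable],
              key: Callable[[Hashable], Hashable]) -> Classes:
    """Elementary sets: x and y are indiscernible when key(x) == key(y)."""
    groups: Dict[Hashable, Set[Hashable]] = {}
    for x in universe:
        groups.setdefault(key(x), set()).add(x)
    return [frozenset(g) for g in groups.values()]


def lower_approximation(classes: Classes, X: Iterable[Hashable]) -> Set[Hashable]:
    """Union of the classes wholly contained in X (certain members)."""
    target = set(X)
    return {x for c in classes if c <= target for x in c}


def upper_approximation(classes: Classes, X: Iterable[Hashable]) -> Set[Hashable]:
    """Union of the classes that intersect X (possible members)."""
    target = set(X)
    return {x for c in classes if c & target for x in c}


def select_equivalent(rows, attribute, value, classes):
    """Retrieve rows whose attribute value is in the same class as `value`."""
    # A value that appears in no listed class is treated as equivalent only to itself.
    wanted = next((c for c in classes if value in c), frozenset({value}))
    return [row for row in rows if row[attribute] in wanted]


# Hypothetical universe with elementary sets {1,2}, {3,4}, {5,6}, {7,8}.
universe = range(1, 9)
classes = partition(universe, key=lambda x: (x - 1) // 2)
X = {1, 2, 3, 7}
lower = lower_approximation(classes, X)   # positive region
upper = upper_approximation(classes, X)
boundary = upper - lower                  # borderline region
negative = set(universe) - upper          # negative region

# Hypothetical COLOR partition; the BLACK class and the rows are invented.
color_classes = [
    frozenset({"BROWN", "TAN", "SORREL", "CHESTNUT"}),
    frozenset({"BLACK", "EBONY"}),
]
horses = [
    {"name": "Star", "COLOR": "TAN"},
    {"name": "Ace", "COLOR": "EBONY"},
    {"name": "Belle", "COLOR": "CHESTNUT"},
]
brown_like = select_equivalent(horses, "COLOR", "BROWN", color_classes)
```

With this data the positive region is {1, 2}, the upper approximation is {1, 2, 3, 4, 7, 8}, the boundary is {3, 4, 7, 8}, and the negative region is {5, 6}. The query for BROWN returns the rows for Star and Belle even though neither stores the literal value BROWN.
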
Because retrieval is based on equivalence classes, the exact wording of a query is less critical. The rough relational database (Beaubouef and Petry 2000) retains significant features of the ordinary relational database. Both models represent data as a set of relations containing tuples. The relations themselves are also sets. The tuples of a relation are its elements and, like the elements of sets in general, are unordered and nonduplicated. A tuple $t_i$ takes the form $(d_{i1}, d_{i2}, \ldots, d_{im})$, where $d_{ij}$ is a domain value of a particular domain set $D_j$. Let $P(D_i)$ denote the powerset($D_i$).

Definition. A rough relation $R$ is a subset of the set cross product $P(D_1) \times P(D_2) \times \cdots \times P(D_m)$. A rough tuple $t$ is any member of $R$, which implies that it is also a member of $P(D_1) \times P(D_2) \times \cdots \times P(D_m)$. If $t_i$ is some arbitrary tuple, then $t_i = (d_{i1}, d_{i2}, \ldots, d_{im})$ where $d_{ij} \subseteq D_j$.

Definition. Tuples $t_i = (d_{i1}, d_{i2}, \ldots, d_{im})$ and $t_k = (d_{k1}, d_{k2}, \ldots, d_{km})$ are redundant if $[d_{ij}] = [d_{kj}]$ for all $j = 1, \ldots, m$, where $[d_{ij}]$ denotes the equivalence class to which $d_{ij}$ belongs.

Copyright © 2007, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Attribute Oriented Generalization

Generalization of data is typically performed using concept hierarchies on an attribute-by-attribute basis, applying a separate concept hierarchy for each of the generalized attributes included in the relation of task-relevant data. The basic steps / guidelines for attribute-oriented generalization in an object-oriented database are summarized below (Han and Kamber 2006):

1. An initial query to the database provides the starting generalization relation R, which contains the set of data that is relevant to the user's generalization interest.
2. If there is a large set of distinct values for an attribute but no higher-level concept is provided for the attribute, the attribute should be removed in the generalization process.
3. If there exists a higher-level concept in the concept tree for an attribute value of a tuple, the substitution of the value by its higher-level concept generalizes the tuple.
4. Two generalized tuples may become similar enough to be merged, so we include an added attribute, Count, to keep track of how many objects have been merged to form the current generalized relation. The value of the count of a tuple should be carried to its generalized tuple, and the counts should be accumulated when merging identical tuples in generalization.
5. The generalization is controlled by providing levels that specify how far the process should proceed. If the number of distinct values of an attribute in the given relation is larger than the generalization threshold value, further generalization on this attribute should be performed. If the number of tuples in a generalized relation is larger than the generalization threshold value, the generalization should proceed further.

We can then extract characteristic rules from generalized data.
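
The five steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure from the paper: the function name, the tuple layout, and the sample horse data are hypothetical, each concept hierarchy is assumed to be an acyclic value-to-parent mapping, and only the per-attribute threshold of step 5 is applied (the threshold on the number of tuples is omitted).

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Hierarchy = Dict[str, str]  # maps a value to its higher-level concept


def attribute_oriented_generalization(
    attributes: Sequence[str],
    rows: Sequence[Tuple[str, ...]],
    hierarchies: Dict[str, Hierarchy],
    threshold: int,
) -> Tuple[List[str], Dict[Tuple[str, ...], int]]:
    """Generalize `rows` (step 1: assumed already task-relevant).

    Returns the surviving attribute names and a mapping from each
    generalized tuple to its accumulated Count.
    """
    # Step 4: each starting tuple has Count 1; identical tuples are merged.
    relation = Counter(rows)
    attrs = list(attributes)

    i = 0
    while i < len(attrs):
        distinct = {t[i] for t in relation}
        if len(distinct) <= threshold:
            i += 1  # step 5: this attribute is general enough
            continue
        hierarchy = hierarchies.get(attrs[i], {})
        merged: Counter = Counter()
        if not any(v in hierarchy for v in distinct):
            # Step 2: too many distinct values and no higher-level
            # concept available, so the attribute is removed.
            for t, count in relation.items():
                merged[t[:i] + t[i + 1:]] += count
            del attrs[i]
        else:
            # Step 3: replace each value by its higher-level concept;
            # values without a parent are kept unchanged.
            for t, count in relation.items():
                merged[t[:i] + (hierarchy.get(t[i], t[i]),) + t[i + 1:]] += count
        relation = merged  # counts accumulate for tuples that became identical
    return attrs, dict(relation)


# Hypothetical task-relevant data and concept hierarchy.
attributes = ("name", "color", "breed")
rows = [
    ("Star", "TAN", "Arabian"),
    ("Duke", "SORREL", "Morgan"),
    ("Ace", "EBONY", "Arabian"),
    ("Belle", "CHESTNUT", "Morgan"),
]
hierarchies = {
    "color": {"TAN": "BROWN", "SORREL": "BROWN",
              "CHESTNUT": "BROWN", "EBONY": "BLACK"},
}
kept, generalized = attribute_oriented_generalization(
    attributes, rows, hierarchies, threshold=2
)
```

In this example the name attribute has four distinct values and no hierarchy, so it is dropped. The color values climb one level, after which the two Morgan horses merge into a single tuple ("BROWN", "Morgan") with Count 2. A characteristic rule could then be read from the tuples and their counts.
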

Similar resources

Learning in Relational Databases: A Rough Set Approach

Knowledge discovery in databases, or data mining, is an important direction in the development of data and knowledge-based systems. Because of the huge amount of data stored in large numbers of existing databases, and because the amount of data generated in electronic forms is growing rapidly, it is necessary to develop efficient methods to extract knowledge from databases. An attribute-oriente...

DBROUGH: A Rough Set Based Knowledge Discovery System

Knowledge discovery in databases, or data mining, is an important objective in the development of data- and knowledge-base systems. An attribute-oriented rough set method is developed for knowledge discovery in databases. The method integrates learning from example techniques with rough set theory. An attribute-oriented concept tree ascension technique is first applied in generalizat...

Attribute-Oriented Induction in Object-Oriented Databases

Knowledge discovery in databases is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data such that the extracted knowledge may facilitate deductive reasoning and query processing in database systems. This branch of study has been ranked among the most promising topics for database research for the 1990s. Due to the dominating influence of relat...

Knowledge Discovery in Databases: An Attribute-Oriented Approach

Knowledge discovery in databases, or data mining, is an important issue in the development of data- and knowledge-base systems. An attribute-oriented induction method has been developed for knowledge discovery in databases. The method integrates a machine learning paradigm, especially learning-from-examples techniques, with set-oriented database operations and extracts generalized data from actua...

Attribute-Oriented Induction in Relational Databases

It is beneficial as well as challenging to learn knowledge rules from relational databases because of the vast amount of knowledge implied in databases and the large amount of data stored in databases. In this thesis, we develop an attribute-oriented induction method to extract characteristic rules and classification rules from relational databases. The method adopts the artificial intelligence ...

Knowledge Discovery in Fuzzy Databases Using Attribute-Oriented Induction

In this paper we analyze an attribute-oriented data induction technique for discovery of generalized knowledge from large data repositories. We employ a fuzzy relational database as the medium carrying the original information, where the lack of precise information about an entity can be reflected via multiple attribute values, and the classical equivalence relation is replaced with relation of...

Publication date: 2007